Cross-lingual keyword assignment
Abstract
This paper presents a language-independent approach to controlled-vocabulary keyword assignment using the EUROVOC thesaurus. Because EUROVOC is multilingual, the keywords assigned to a document written in one language can be displayed in all eleven official European Union languages. Mapping documents written in different languages to the same multilingual thesaurus also allows cross-language document comparison. The thesaurus descriptors are assigned by a statistical method that uses a collection of manually indexed documents to identify, for each descriptor, a large number of lemmas that are statistically associated with it. During assignment, these associated lemmas are used to produce a ranked list of the EUROVOC descriptors most likely to be good keywords for a given document. The paper also describes the challenges of this task and discusses the results achieved by the fully functional prototype.
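The two steps described in the abstract (learning lemma-descriptor associations from manually indexed documents, then ranking descriptors for a new document) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not name the association statistic, so pointwise mutual information over document frequencies is swapped in here as an assumption, and the function names (`train_associations`, `rank_descriptors`, `descriptor_cosine`), the toy lemmas, and the descriptor labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_associations(indexed_docs, min_weight=0.0):
    """Learn lemma weights per descriptor from (lemmas, descriptors) pairs.

    Assumption: the weight is pointwise mutual information (PMI) computed on
    document frequencies; the paper's actual statistic may differ.
    """
    n_docs = len(indexed_docs)
    lemma_df = Counter()             # number of documents containing each lemma
    desc_df = Counter()              # number of documents indexed with each descriptor
    joint_df = defaultdict(Counter)  # descriptor -> lemma -> co-occurrence count
    for lemmas, descriptors in indexed_docs:
        lemma_set = set(lemmas)
        lemma_df.update(lemma_set)
        for desc in set(descriptors):
            desc_df[desc] += 1
            joint_df[desc].update(lemma_set)

    assoc = {}
    for desc, counts in joint_df.items():
        weights = {}
        for lemma, joint in counts.items():
            pmi = math.log((joint * n_docs) / (lemma_df[lemma] * desc_df[desc]))
            if pmi > min_weight:     # keep only positively associated lemmas
                weights[lemma] = pmi
        assoc[desc] = weights
    return assoc


def rank_descriptors(lemmas, assoc, top_n=5):
    """Score each descriptor by the summed weights of its associated lemmas
    that occur in the document, and return the top_n (descriptor, score) pairs."""
    present = set(lemmas)
    scores = {
        desc: sum(w for lemma, w in weights.items() if lemma in present)
        for desc, weights in assoc.items()
    }
    ranked = sorted(
        ((desc, s) for desc, s in scores.items() if s > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_n]


def descriptor_cosine(ranked_a, ranked_b):
    """Cosine similarity between two documents' descriptor score vectors."""
    a, b = dict(ranked_a), dict(ranked_b)
    dot = sum(a[d] * b[d] for d in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy training collection; lemmas and descriptor labels are made up.
training = [
    (["fish", "quota", "vessel"], ["fishery policy"]),
    (["fish", "catch", "sea"], ["fishery policy"]),
    (["tariff", "import", "quota"], ["trade policy"]),
    (["tariff", "export", "duty"], ["trade policy"]),
]
assoc = train_associations(training)
ranked = rank_descriptors(["fish", "vessel", "quota"], assoc)
print(ranked)
```

In this sketch the associations are language-specific, because lemmas are. If one association table were trained per language against the same descriptor identifiers, documents in different languages would land in a shared descriptor space, where a function such as `descriptor_cosine` could compare them.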
Similar resources
Cross-Lingual Information to the Rescue in Keyword Extraction
We introduce a method that extracts keywords in a language with the help of the other. In our approach, we bridge and fuse conventionally irrelevant word statistics in languages. The method involves estimating preferences for keywords w.r.t. domain topics and generating cross-lingual bridges for word statistics integration. At run-time, we transform parallel articles into word graphs, build cro...
Exploiting Knowledge Bases for Multilingual and Cross-lingual Semantic Annotation and Search
The amount of entities in large knowledge bases (KBs) has been increasing rapidly, making it possible to propose new ways of intelligent information access. In addition, there is an impending need for systems that can enable multilingual and cross-lingual information access. In this work, we firstly demonstrate X-LiSA, an infrastructure for multilingual and cross-lingual semantic annotation, wh...
Keyword Translation Accuracy And Cross-Lingual Question Answering In Chinese And Japanese
In this paper, we describe the extension of an existing monolingual QA system for English-to-Chinese and English-to-Japanese cross-lingual question answering (CLQA). We also attempt to characterize the influence of translation on CLQA performance through experimental evaluation and analysis. The paper also describes some language-specific issues for keyword translation in CLQA.
A Knowledge Base Approach to Cross-Lingual Keyword Query Interpretation
The amount of entities in large knowledge bases available on the Web has been increasing rapidly, making it possible to propose new ways of intelligent information access. In addition, there is an impending need for technologies that can enable cross-lingual information access. As a simple and intuitive way of specifying information needs, keyword queries enjoy widespread usage, but suffer from...
Predicting Influential Cross-lingual Information Cascades on Twitter
Social network services (SNSs) have become new global and multilingual information platforms due to their popularity. In SNSs with content-sharing functionality, such as "retweets" in Twitter and "share" in Facebook, posts are easily and quickly shared among users, and some grow into large information cascades. Accompanied with such growth, cascades can spread over regions and languages. The ...
Journal:
- Procesamiento del Lenguaje Natural
Volume 27
Publication date: 2001